A Conditional Random Field-based Traditional Chinese Base Phrase Parser for SIGHAN Bake-off 2012 Evaluation
Authors
Abstract
This paper describes our system for Subtask 1 of the Traditional Chinese Parsing task of the SIGHAN Bake-off 2012 evaluation. Because this research mainly targets speech recognition and synthesis applications, only base phrase chunking was implemented, using three Conditional Random Field (CRF) modules: word segmentation, POS tagging and base phrase chunking sub-systems. The official evaluation results show that the system achieved micro-averaged F1 of 0.5038 (precision/recall 0.7210/0.387) and macro-averaged F1 of 0.5301 (precision/recall 0.7343/0.4147) on the full sentence parsing task. However, if only base phrase chunking performance is considered, the F-measures may be around 0.70, which is arguably good enough for speech recognition and synthesis applications.
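The abstract frames base phrase chunking as a sequence labeling problem and reports F1 as the harmonic mean of precision and recall. The sketch below illustrates only the chunk decoding and scoring side, assuming a BIO encoding of base phrases (tags such as `B-NP`, `I-NP`, `O`); the abstract does not state the paper's actual tag set, and the CRF modules themselves are not implemented here. The function names `bio_to_spans`, `chunk_prf` and `f1` are hypothetical, and the official bake-off scorer may compute averages differently.

```python
from typing import List, Optional, Set, Tuple

# A labelled chunk: (phrase type, start index, end index exclusive).
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Decode a BIO tag sequence (assumed encoding) into labelled chunk spans."""
    spans: Set[Span] = set()
    label: Optional[str] = None
    start = 0
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" flushes the last chunk
        prefix, _, kind = tag.partition("-")
        # A chunk ends on "O", on a new "B-", or on an "I-" of a different type.
        if label is not None and (prefix != "I" or kind != label):
            spans.add((label, start, i))
            label = None
        # "B-" starts a chunk; an orphan "I-" is also treated as a chunk start.
        if prefix == "B" or (prefix == "I" and label is None):
            label, start = kind, i
    return spans


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def chunk_prf(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Chunk-level P/R/F1: a chunk counts only if its type and both boundaries match."""
    g, p = bio_to_spans(gold), bio_to_spans(pred)
    correct = len(g & p)
    precision = correct / len(p) if p else 0.0
    recall = correct / len(g) if g else 0.0
    return precision, recall, f1(precision, recall)


if __name__ == "__main__":
    # Hypothetical five-word sentence; tags are illustrative, not bake-off data.
    gold = ["B-NP", "I-NP", "B-VP", "B-NP", "I-NP"]
    pred = ["B-NP", "I-NP", "B-VP", "B-NP", "O"]
    print(chunk_prf(gold, pred))
    # Reported macro-averaged precision/recall from the abstract.
    print(round(f1(0.7343, 0.4147), 4))  # → 0.5301
```

Plugging the reported macro-averaged precision/recall into the harmonic mean reproduces the 0.5301 figure; the micro-averaged pair (0.7210/0.387) gives about 0.5037 rather than 0.5038, presumably because the published precision and recall are rounded.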
Similar resources
Sentence Parsing with Double Sequential Labeling in Traditional Chinese Parsing Task
In this paper, we propose a new sequential labeling scheme, double sequential labeling, which we apply to Chinese parsing. The parser is built with conditional random field (CRF) sequential labeling models. One focuses on the beginning of a phrase and the phrase type, while the other focuses on the end of a phrase. Our system, CYUT, participated in the 2012 second CIPS-SIGHAN conference Bake-off Task...
Conditional Random Field-based Parser and Language Model for Traditional Chinese Spelling Checker
This paper describes our Chinese spelling check system submitted to the SIGHAN Bake-off 2013 evaluation. The main idea is to replace each potentially erroneous character with its confusable characters and rescore the modified sentence using a conditional random field (CRF)-based word segmentation/part-of-speech (POS) tagger and a tri-gram language model (LM) to detect and correct possible spelling errors. Experime...
NCTU and NTUT's Entry to CLP-2014 Chinese Spelling Check Evaluation
This paper describes our Chinese spelling check system submitted to the SIGHAN Bake-off 2014 evaluation. The system's main components are still the conditional random field (CRF)-based word segmentation/part-of-speech (POS) tagger and tri-gram language model (LM) used last year. But we tried to refine the misspelling rules, decision-making threshold and improve LM rescoring speed to reduce false ala...
Chinese Word Segmentation based on Mixing Multiple Preprocessor and CRF
This paper describes the Chinese Word Segmenter for our participation in the CIPS-SIGHAN-2010 bake-off task of Chinese word segmentation. We formalize the tasks as sequence tagging problems and implement them using a conditional random fields (CRFs) model. The system contains two modules: a multiple preprocessor and a basic segmenter. The basic segmenter is designed as a problem of character-based tagg...
Traditional Chinese Parsing Evaluation at SIGHAN Bake-offs 2012
This paper presents an overview of the traditional Chinese parsing task at SIGHAN Bake-offs 2012. On behalf of the task organizers, we describe all aspects of the task for traditional Chinese parsing, i.e., task description, data preparation, performance metrics, and evaluation results. We summarize the performance results of all participant teams in this evaluation, in the hope of encouraging more futu...
Publication date: 2012